Skip to content

fix: guard array_position start_from underflow#22291

Open
Sean-Kenneth-Doherty wants to merge 2 commits into
apache:mainfrom
Sean-Kenneth-Doherty:codex/array-position-min-start
Open

fix: guard array_position start_from underflow#22291
Sean-Kenneth-Doherty wants to merge 2 commits into
apache:mainfrom
Sean-Kenneth-Doherty:codex/array-position-min-start

Conversation

@Sean-Kenneth-Doherty

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

array_position converts the optional 1-indexed start_from argument to a 0-indexed offset by subtracting one. When start_from is i64::MIN, that subtraction can underflow and panic before DataFusion can return a normal error.

What changes are included in this PR?

  • Adds a shared checked conversion for array_position start_from values.
  • Uses it in both scalar and array start_from paths.
  • Adds regression coverage for the i64::MIN boundary.

Are these changes tested?

  • cargo fmt --all
  • TMPDIR=/home/sean/Projects/datafusion-array-position-min/target/tmp cargo test -p datafusion-functions-nested position::tests::test_array_position_start_from_min_value -- --nocapture
  • TMPDIR=/home/sean/Projects/datafusion-array-position-min/target/tmp cargo test --profile=ci --test sqllogictests -- array/array_position.slt
  • TMPDIR=/home/sean/Projects/datafusion-array-position-min/target/tmp cargo clippy --all-targets --all-features -- -D warnings
  • git diff --check

I also manually checked the issue reproducer no longer panics: EXPLAIN SELECT array_position([1], 1, -9223372036854775808); now returns a plan, while plain execution returns Execution error: start_from out of bounds: -9223372036854775808.

Are there any user-facing changes?

Invalid array_position start_from values at the i64::MIN boundary now return a normal DataFusion error instead of panicking.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 17, 2026
@Sean-Kenneth-Doherty

Copy link
Copy Markdown
Contributor Author

Fresh validation on the PR head (862c36ee8):

  • TMPDIR=/home/sean/Projects/datafusion-array-position-min/target/tmp cargo test -p datafusion-functions-nested position::tests::test_array_position_start_from_min_value -- --nocapture -> passed (1 passed)
  • TMPDIR=/home/sean/Projects/datafusion-array-position-min/target/tmp cargo test --profile=ci --test sqllogictests -- array/array_position.slt -> passed (1/1 files completed)
  • cargo fmt --all -- --check -> passed
  • git diff --check origin/main...HEAD -> clean

GitHub Actions only ran the labeler for this PR, so this keeps the regression evidence visible while the branch waits for review.

@Jefffrey Jefffrey left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use saturating sub instead? It seems we have paths later on validating it isn't negative, but only checking when the row is non-null in the input array:

if validity.is_some_and(|v| v.is_null(i)) {
// Null row -> null output; advance past matches in range
while matches.peek().is_some_and(|&p| p < end) {
matches.next();
}
result.push(None);
continue;
}
let from = arr_from[i];
let row_len = end - start;
if !(from >= 0 && (from as usize) <= row_len) {
return exec_err!("start_from out of bounds: {}", from + 1);
}

for (row, &from) in haystack.iter().zip(arr_from.iter()) {
if !row.is_none_or(|row| from >= 0 && (from as usize) <= row.len()) {
return exec_err!("start_from out of bounds: {}", from + 1);
}
}

Just to keep things consistent 🤔

@Sean-Kenneth-Doherty

Copy link
Copy Markdown
Contributor Author

Updated in 57df3fbb9 to use saturating_sub(1) for start_from normalization, matching the existing later bounds-validation flow.

I also changed the regression to exercise the full array_position path for a non-null row with i64::MIN, so the important behavior remains covered: no arithmetic panic, just a normal start_from out of bounds error.

Validation:

  • cargo fmt --all --check
  • git diff --check
  • CARGO_BUILD_JOBS=2 cargo test -p datafusion-functions-nested position::tests::test_array_position_start_from_min_value -- --nocapture
  • CARGO_BUILD_JOBS=2 cargo test -p datafusion-sqllogictest --test sqllogictests -- array/array_position.slt (1/1 files completed)

}
}

fn normalize_start_from(start_from: i64) -> Result<i64> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can just inline this

use datafusion_common::config::ConfigOptions;

#[test]
fn test_array_position_start_from_min_value() -> Result<()> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this test since we already have an SLT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

panic: array_position underflows start_from at i64::MIN

2 participants